Over the past few years, the artificial intelligence race looked like a story about infrastructure. Which company can build the biggest, most power-hungry data center, stock it with the most Nvidia GPUs and spend the most money? OpenAI, Amazon, Google, xAI — they’re all in a competition to build industrial-scale computing factories just to run the most powerful AI models. But it looks like developer Dan Woods just upended that story by running a data-center AI model on a MacBook.
And that could mean Apple wins the AI race after all.
Developer runs data-center AI model on MacBook
Woods announced on X this week that he managed to get Qwen3.5-397B — a cutting-edge “frontier” AI model that normally requires a server rack full of specialized hardware — running on a 48GB MacBook Pro with an M3 Max chip. The model occupies 209GB on disk (120GB compressed), far more than any laptop can hold in working memory. Yet Woods got it generating more than 5.5 tokens per second. That’s a shocking accomplishment for a consumer laptop — especially one from a company with a reputation for bringing up the rear in AI development.
To understand why this is remarkable, some context helps. Frontier AI models — the class of models that powers ChatGPT, Claude and Gemini at their most capable — are typically enormous. Running them requires loading their billions of parameters into fast memory. A 48GB MacBook has nowhere near enough RAM to do that for a 209GB model.
So how did Woods pull it off?
The secret: Apple’s own research
The key was a 2023 research paper Apple quietly published called “LLM in a Flash: Efficient Large Language Model Inference with Limited Memory.” The paper tackles the challenge of running LLMs that exceed available memory by storing model parameters in flash storage and streaming them into RAM on demand — guided by an inference cost model that minimizes data transfer and reads data in larger, more efficient chunks.
In other words, Apple’s engineers had already figured out theoretically how to run huge AI models on devices with limited RAM. The technique takes advantage of the fact that modern Macs use fast NVMe SSD storage — and, crucially, Apple silicon’s unified memory architecture, which lets the CPU and GPU work from the same memory pool in unusually tight coordination.
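The core idea can be illustrated with a toy sketch (this is not Woods’ actual code, and it drastically simplifies the paper’s method): keep the weights in a memory-mapped file on disk, and pull only the slice each layer needs into RAM at the moment that layer runs.

```python
import numpy as np

# Toy illustration of flash-streaming inference: weights live on disk,
# and only the layer currently executing is pulled into RAM.
# The real technique layers a cost model, chunked reads and sparsity
# prediction on top of this basic idea.

N_LAYERS, DIM = 4, 8

# Write toy "model weights" to disk once (a stand-in for a 200GB+ checkpoint).
weights = np.random.default_rng(0).standard_normal(
    (N_LAYERS, DIM, DIM)).astype(np.float32)
weights.tofile("weights.bin")

# Memory-map the file: nothing is loaded until a slice is actually read.
mmapped = np.memmap("weights.bin", dtype=np.float32, mode="r",
                    shape=(N_LAYERS, DIM, DIM))

def forward(x):
    """Run the toy model, streaming one layer's weights at a time."""
    for layer in range(N_LAYERS):
        w = np.asarray(mmapped[layer])  # only this layer's slice touches RAM
        x = np.tanh(x @ w)
    return x

out = forward(np.ones(DIM, dtype=np.float32))
print(out.shape)  # (8,)
```

The point of the sketch: peak RAM use scales with one layer, not the whole model — the rest of the time, the weights sit on the SSD.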
Woods combined what he learned from the paper with another insight. The Qwen model he chose is a “Mixture of Experts” (MoE) architecture. MoE models only activate a subset of their parameters for each token generated. That means the active weights can be streamed in from storage rather than all held in memory at once, according to developer Simon Willison, who wrote about Woods’ work. Woods dropped the number of active experts per token from 10 to 4. That compromise preserved most of the model’s quality while dramatically reducing memory demands.
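To see why cutting the number of active experts shrinks memory demands, here is a minimal, hypothetical top-k MoE routing sketch (illustrative only — not Qwen’s implementation): a router scores every expert for each token, but only the k highest-scoring experts are actually run, so only their weights need to be resident in memory.

```python
import numpy as np

# Minimal Mixture-of-Experts routing sketch (illustrative, not Qwen's code).
# Only the top-k experts run per token, so only their weights must be in
# memory; dropping k from 10 to 4 shrinks that working set accordingly.

rng = np.random.default_rng(1)
N_EXPERTS, DIM = 32, 16
experts = [rng.standard_normal((DIM, DIM)).astype(np.float32)
           for _ in range(N_EXPERTS)]
router = rng.standard_normal((DIM, N_EXPERTS)).astype(np.float32)

def moe_forward(x, k):
    """Route token x to its top-k experts and mix their outputs."""
    scores = x @ router                    # one score per expert
    top = np.argsort(scores)[-k:]          # indices of the k best experts
    gate = np.exp(scores[top] - scores[top].max())
    gate /= gate.sum()                     # softmax over the selected experts
    # Only these k expert matrices would need to be streamed into RAM.
    return sum(g * (x @ experts[i]) for g, i in zip(gate, top))

x = rng.standard_normal(DIM).astype(np.float32)
full = moe_forward(x, k=10)  # 10 active experts per token
lean = moe_forward(x, k=4)   # 4 active experts: less memory traffic
print(full.shape, lean.shape)
```

Both calls produce a full-size output; the k=4 version simply consults fewer experts per token, which is the tradeoff Woods accepted to fit the model’s working set on a laptop.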
He vibe-coded it with AI

Image: @danveloper on X.com
Here’s another twist that makes this story very 2026: Woods didn’t write all this low-level optimization code by hand. He fed Apple’s paper to Claude Code and used an autoresearch pattern to run 90 automated experiments, producing highly optimized code in MLX, Objective-C and Metal, Apple’s low-level graphics and compute language that runs directly on Apple silicon.
The result is open-source on GitHub, along with an AI-written technical paper describing the experiments in detail.
Data-center AI model on MacBook: Why it matters for Apple
The implications for Apple’s competitive position in AI are significant. The dominant narrative that Apple is behind — that Siri is a joke compared to ChatGPT, that Apple Intelligence is underwhelming, that the company missed the generative AI wave — could be misleading. Woods’ experiment suggests Apple may have quietly built the right hardware all along.
Apple silicon’s unified memory architecture lets CPU and GPU share the same high-bandwidth memory pool. And that looks like precisely the design needed for the flash-streaming technique Apple’s own researchers described. No other mainstream laptop platform has this. So MacBook Pro isn’t just a laptop that can run AI on the side. It may be the most capable personal AI computer on the market.
While competitors race to build billion-dollar data centers, the most powerful AI model you can run might soon be the one already sitting in your bag. Apple’s chip lead, combined with techniques like these, could make local AI on Mac — private, fast and free from cloud subscriptions — a genuine reality far sooner than anyone expected.
As Willison noted, the quality tradeoffs are still being evaluated. But it’s hard to overstate the breakthrough of simply getting it running. The AI race might not be won in a data center after all.